02. Navigation with Deep RL

Autonomous navigation through complex environments has traditionally been demonstrated with SLAM (Simultaneous Localization and Mapping) algorithms, which combine position inference with map building. Recent advances in deep neural networks and reinforcement learning (RL) have opened the door to solving navigation with deep RL instead. As with the other deep RL solutions we’ve discussed, the basic idea is to use visual inputs and rewards to train a neural network to output the actions that move a robot toward a goal. In practice, however, there are a number of implementation challenges.

Mirowski paper

The 2017 paper “Learning to Navigate in Complex Environments” by Piotr Mirowski et al. presents a deep RL approach to the problem of navigation in complex environments. The approach is tested in visually rich, game-like 3-D simulated maze environments, and the results compare favorably to human performance on the same mazes. The paper addresses the following challenges in navigation using RL:

  • Rewards are often sparsely distributed in an environment
  • Environments often include dynamic elements

The Mirowski paper addresses the problems of sparse rewards and dynamic elements by incorporating multiple training objectives and by applying several deep RL algorithmic improvements over the simple DQN algorithm implemented in our earlier lessons.

Auxiliary objectives

In addition to the usual primary objective of maximizing cumulative reward, the agent is trained on two auxiliary objectives: inferring depth estimates from RGB observations and detecting loop closures while mapping. By training on these auxiliary objectives, the agent learns information it also needs for obstacle avoidance and path planning. As explained in the DeepMind paper on auxiliary tasks in deep RL, this is “similar to how a baby might learn to control their hands by moving them and observing the movements”.
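
To make the idea concrete, here is a minimal sketch of how auxiliary losses can be folded into the overall training loss. The weighting coefficients and the simple depth regression below are illustrative assumptions, not the paper’s exact formulation (the paper also discusses treating depth prediction as classification over quantized depth bins).

```python
import torch.nn.functional as F

def total_loss(rl_loss, depth_pred, depth_target, loop_logit, loop_target,
               beta_depth=0.3, beta_loop=1.0):
    """Combine the main RL objective with the two auxiliary objectives.

    rl_loss:  actor-critic loss from the navigation task itself
    depth_*:  predicted vs. reference depth for the current RGB frame
    loop_*:   predicted vs. actual loop-closure label (1.0 if the agent has
              returned to a previously visited location, else 0.0)
    beta_*:   illustrative weighting coefficients (assumptions, not values
              taken from the paper)
    """
    depth_loss = F.mse_loss(depth_pred, depth_target)   # simplified: depth as regression
    loop_loss = F.binary_cross_entropy_with_logits(loop_logit, loop_target)
    return rl_loss + beta_depth * depth_loss + beta_loop * loop_loss
```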

Algorithm and Architecture

The RL problem in the paper is addressed using the Asynchronous Advantage Actor-Critic (A3C) algorithm, which is considerably more efficient to train than DQN. To build intuition for how actor-critic methods work, see the overview provided in the next concept. Actor-critic algorithms require the network to output both a policy, π, and a value function estimate, V.
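
As a rough sketch of how those two outputs are used (not the paper’s full A3C implementation, which runs many asynchronous workers with n-step returns), the loss for a single batch of experience combines a policy term weighted by the advantage, a value-regression term, and an entropy bonus. The function and coefficient names below are illustrative.

```python
import torch.nn.functional as F
from torch.distributions import Categorical

def actor_critic_loss(policy_logits, values, actions, returns,
                      value_coef=0.5, entropy_coef=0.01):
    """One advantage actor-critic loss term (illustrative coefficients).

    policy_logits: [T, num_actions] unnormalized action scores (the actor, pi)
    values:        [T]              state-value estimates V(s) (the critic)
    actions:       [T]              actions actually taken
    returns:       [T]              discounted returns observed from those states
    """
    dist = Categorical(logits=policy_logits)
    advantages = returns - values                       # how much better than expected
    policy_loss = -(dist.log_prob(actions) * advantages.detach()).mean()
    value_loss = F.mse_loss(values, returns)            # push V(s) toward observed returns
    entropy = dist.entropy().mean()                     # bonus that encourages exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```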

The network architecture also includes memory in the form of LSTM (Long Short-Term Memory) layers, which add the capability of learning long-term dependencies. The diagram below gives an overview of the deep RL neural network architecture described by Mirowski. Notice that the auxiliary depth predictions are made at two points in the network: from the convolutional features (D1) and from the LSTM output (D2).
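
A condensed PyTorch sketch of this kind of architecture is shown below. The layer sizes, the single LSTM cell, and the flattened depth outputs are simplifying assumptions for illustration, not the paper’s exact configuration.

```python
import torch.nn as nn

class NavNet(nn.Module):
    """Sketch of the architecture family described by Mirowski et al.: a
    convolutional encoder, an LSTM, policy/value heads, and auxiliary depth
    predictions from both the conv features (D1) and the LSTM state (D2)."""

    def __init__(self, num_actions, depth_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 32 * 9 * 9                           # for 84x84 RGB input
        self.lstm = nn.LSTMCell(feat_dim, 256)
        self.policy = nn.Linear(256, num_actions)       # actor head (pi)
        self.value = nn.Linear(256, 1)                  # critic head (V)
        self.depth_d1 = nn.Linear(feat_dim, depth_dim)  # auxiliary depth from conv features
        self.depth_d2 = nn.Linear(256, depth_dim)       # auxiliary depth from LSTM output
        self.loop = nn.Linear(256, 1)                   # auxiliary loop-closure detection

    def forward(self, rgb, hidden):
        feat = self.conv(rgb)                           # visual features
        h, c = self.lstm(feat, hidden)                  # memory of past observations
        return (self.policy(h), self.value(h).squeeze(-1),
                self.depth_d1(feat), self.depth_d2(h),
                self.loop(h).squeeze(-1), (h, c))
```

During training, the policy and value outputs would feed the actor-critic loss, while the D1, D2, and loop-closure outputs would feed the auxiliary losses sketched earlier.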

Results

The Mirowski paper analyzes navigation success across various combinations of the A3C algorithm, LSTM memory, and the auxiliary objectives, and also compares them against human players. The conclusion is that, for most of the mazes, the best-performing combination is A3C with LSTM memory and the auxiliary objectives attached to the network, and its results come close to human performance.

What does this tell us?

The research in this paper provides a deep RL architecture for navigation, tested in a simulated robotic environment. For practitioners, it offers a starting point for improving on the simpler, but more limited, DQN architecture from our earlier examples.

Additional Resources

You’re encouraged to read the papers cited here on your own and learn as much as you can from them. This field is changing rapidly, with breakthroughs appearing first in research; within a few years, some of them may be adopted by commercial ventures. The more you know about what’s coming, the better prepared you will be to use it in your own projects or in the employ of an innovative company!